Semantic PDF Segmentation for Legacy Documents in Technical Documentation
Similar resources
Semantic Indexing of Technical Documentation
This research takes place in an industrial context: the CONTINEW Company. This company ensures the storage and security of critical data and technical documentation. Consequently, these documents must be organized so that critical information can be retrieved quickly. Managing this increasing volume of documents requires document classification, which is based on indexing techniqu...
Semantic Web Technologies in Technical Automotive Documentation
RDF is the format of choice to exchange data between software components of a corporate system. That’s why we decided to use it in a recent work at Renault, in the field of technical documentation. The prototype of a new repository for repair and diagnostic information was modeled with OWL. REST web services using RDF as data format were built on this repository, to provide access to improved r...
Learning Semantic Correspondences in Technical Documentation
We consider the problem of translating high-level textual descriptions to formal representations in technical documentation as part of an effort to model the meaning of such documentation. We focus specifically on the problem of learning translational correspondences between text descriptions and grounded representations in the target documentation, such as formal representation of functions or...
Reconstructing Semantic Structures in Technical Documentation with Vector Space Classification
With the increasing popularity of component content management systems, a large part of technical documentation in manufacturing and mechanical engineering is written in semantically structured, XML-based information models. Content delivery portals can use this information to provide users with advanced retrieval or filtering functions. However, legacy content is often excluded from such g...
Layout and Content Extraction for PDF Documents
Portable document format (PDF) is a common output format for electronic documents. Most PDF documents are untagged and do not have basic high-level document logical structural information, which makes the reuse or modification of the documents difficult. We developed techniques that identified logical components on a PDF document page. The outlines, style attributes and the contents of the logi...
Journal
Journal title: Procedia Computer Science
Year: 2018
ISSN: 1877-0509
DOI: 10.1016/j.procs.2018.09.006